Abstract

Background: Copy number variations (CNVs) have been shown to be important in both normal phenotypic variability
and disease susceptibility, and are increasingly accepted as an important source of genetic variation
complementary to single nucleotide polymorphisms (SNPs). Comprehensive identification and cataloging of pig CNVs
would benefit functional analyses of genome variation.

Results: In this study, we performed genome-wide CNV detection based on the Porcine SNP60 genotyping data
of 474 pigs from three pure breed populations (Yorkshire, Landrace and Songliao Black) and one Duroc x Erhualian
crossbred population. A total of 382 CNV regions (CNVRs) across the genome were identified, which cover 95.76Mb of
the pig genome and correspond to 4.23% of the autosomal genome sequence. The length of these CNVRs ranged
from 5.03 to 2,702.7kb with an average of 250.7kb, and their frequencies varied from 0.42 to 20.87%. These
CNVRs contain 1,468 annotated genes, which possess a great variety of molecular functions, making them a
promising resource for exploring the genetic basis of phenotypic variation within and among breeds. To
confirm these findings, 18 CNVRs representing different predicted statuses and frequencies were chosen for
validation via quantitative real time PCR (qPCR). Of these, 12 (66.67%) were successfully confirmed.

Conclusions: Our results demonstrate that the currently available Porcine SNP60 BeadChip can be used to capture
CNVs efficiently. Our study provides the first comprehensive map of copy number variation in the pig genome,
which should help in understanding the pig genome and provides a preliminary foundation for investigating
the association between various phenotypes and CNVs.
Background

Copy number variation (CNV) is defined as a segment of DNA that is
1kb or larger and present at a variable copy number in comparison
with a reference genome [1,2]. CNV has attracted considerable
interest as a source of genetic variation in many species. Extensive
studies have been performed to identify and map CNVs in humans [1-3],
model organisms [4-6] and domestic animals [7-11]. Compared with
SNPs, the most frequent type of marker, CNVs cover wider genomic
regions in terms of total bases involved and have potentially larger
effects by changing gene structure and dosage, altering gene
regulation, exposing recessive alleles and other mechanisms [12,13].
CNVs have been shown to be important in both normal phenotypic
variability and disease susceptibility [1,13,14], and association
studies of CNVs and diseases have become popular in humans [15-17].
In animals, phenotypic variation caused by CNVs has also been
observed, for instance the white coat phenotype in pigs caused by
copy number variation of the KIT gene [18,19] and the pea-comb
phenotype in chickens caused by copy number variation in intron 1 of
the SOX5 gene [20]. These examples demonstrate that CNVs can be
considered promising markers for some economically important traits
or diseases in domestic animals. Thus, comprehensive identification
and cataloging of CNVs will greatly benefit functional analyses of
genome variation.

Although the pig is one of the most economically important livestock
species worldwide as well as a suitable animal model for human
disease, few studies have investigated CNV in pigs compared with
other species [4-8,21,22]. So far, only two studies on pig CNV
detection have been reported. Fadista et al. [9] provided the first
CNV survey (37 CNVRs) among 12 Duroc boars using a custom tiling
oligonucleotide array CGH approach. Ramayo-Caldas et al. [10]
identified 49 CNVRs in 55 animals from an Iberian x Landrace cross
using Porcine SNP60 BeadChips. Previous genome-scale studies suggest
that CNVs comprise up to ~12%, 4% and 4.6% of the human [2], dog [21]
and cattle [8] genome sequences, respectively. Compared with the
abundance of CNVRs detected in other species, CNV detection in the
pig is far from saturation.

Currently, CNVs can be identified using different technological
approaches. Two major platforms, i.e., comparative genomic
hybridization (CGH) arrays and SNP genotyping arrays, were
extensively compared by Redon et al. [2]. Although the CGH array
based approach performs very well in terms of signal-to-noise ratio,
the SNP genotyping array has the advantage of supporting both
genome-wide association studies (GWAS) and CNV detection [23]. CGH
arrays report only relative signal intensities, whereas SNP arrays
collect normalized total signal intensity (Log R ratio, LRR) and
allelic intensity ratios (B allele frequency, BAF), which represent
overall copy numbers and allelic contrasts [23]. SNP arrays require
less sample per experiment than CGH arrays and are cost-effective,
which allows users to increase the number of samples tested on a
limited budget [24]. SNP arrays are now routinely used for CNV
detection in humans and other organisms [2,8,10,25], and
manufacturers have incorporated non-polymorphic markers into their
SNP genotyping arrays to improve coverage for CNV analyses [26].

In the present study, using the PennCNV software [27], genome-wide
CNV detection based on the Porcine SNP60 BeadChip was performed in a
large sample of 474 pigs from four populations with different genetic
backgrounds. Our study provides the first comprehensive map of CNVs
in the pig genome, which should help in understanding genomic
variation in the pig and provides a preliminary foundation for
investigating the association between various economically important
phenotypes and CNVs.

Results 

Genome-wide detection of CNVs 

Overall, 4,279 CNVs were identified by PennCNV on the 18 pairs of
autosomal chromosomes. The average number of CNVs per individual was
9.03. By aggregating overlapping CNVs, a total of 382 CNVRs
(Additional file 1: Table S1) across the genome were identified,
which cover 95.76Mb of the pig genome and correspond to 4.23% of the
autosomal genome sequence. Among these CNVRs, we found 296 loss, 34
gain and 52 both (loss and gain within the same region) events. The
length of these CNVRs ranged from 5.03 to 2,702.7kb with a mean of
250.7kb and a median of 142.9kb. The frequencies of these CNVRs
ranged from 0.42 to 20.87%. In particular, there were 46 CNVRs with
frequency >5%, and 8 CNVRs with frequency >10%. Figure 1 summarizes
the location and characteristics of all CNVRs on autosomal
chromosomes. These CNVRs are clearly not uniformly distributed among
chromosomes. The proportion of CNVRs on the 18 pairs of autosomal
chromosomes varies from 2.36 to 12.04%. Chromosome 13 harbors the
greatest number (46) of CNVRs, whereas chromosome 12 has the densest
CNVRs, with an average distance of 1,226.94kb between CNVRs.

In this study, samples from four populations were used, including
119 Yorkshire pigs, 13 Landrace pigs, 15 Songliao Black pigs and 327
Duroc x Erhualian crossbred pigs. Large differences in CNVR numbers
were found among the four populations (Table 1). In the Duroc x
Erhualian crossbred, we identified 239 CNVRs, which comprised 62.57%
of the total CNVRs detected herein. In Yorkshire, 178 CNVRs were
detected, corresponding to nearly half of the total number (46.60%),
while only 89 (23.30%) and 101 (26.44%) CNVRs were found in Landrace
and Songliao Black, respectively. A total of 248 unique CNVRs, i.e.,
CNVRs detected in only one population, were found, including 184, 57,
3 and 4 in the Duroc x Erhualian crossbred, Yorkshire, Landrace and
Songliao Black, respectively.

Gene content of pig CNVRs 

In total, 1,468 genes within the identified CNVRs were retrieved
from the Ensembl Genes 64 Database using the BioMart data management
system [28], including 1,322 protein-coding genes, 80 miRNA, 29
pseudogenes, 29 snoRNA, 40 snRNA, 11 rRNA, six miscRNA and one
retrotransposed gene (Additional file 1: Table S2). These genes are
distributed in 282 (73.8%) CNVRs, while the other 100 CNVRs do not
contain any annotated genes.

In order to provide insight into the functional enrichment of the
CNVs, Gene Ontology (GO) [29] and Kyoto Encyclopedia of Genes and
Genomes (KEGG) [30] pathway analyses were performed with the DAVID
bioinformatics resources [31]. The GO analyses revealed 119 GO terms
(Additional file 1: Table S3), of which 23 were statistically
significant after Benjamini correction. The significant GO terms were
mainly involved in sensory perception of smell or chemical stimulus,
olfactory receptor activity, G-protein coupled receptor protein
signaling pathway, cell surface receptor linked signal transduction,
and other basic metabolic processes. There were also some enriched
terms with marginal significance, which were involved in antigen
processing and presentation, MHC class II protein complex, innate
immune response and adaptive immune response. The KEGG pathway
analyses indicated that the genes in the CNVRs were enriched in eight
pathways (Additional file 1: Table S4), of which six were
statistically significant after Benjamini correction, i.e., olfactory
transduction, systemic lupus erythematosus, linoleic acid metabolism,
drug metabolism, arachidonic acid metabolism, and metabolism of
xenobiotics by cytochrome P450.

Figure 1 Genomic distribution of CNVRs in 18 pairs of autosomal chromosomes of pigs. The chromosomal locations of 382 CNVRs are
indicated by lines. Y-axis values are chromosome names, and X-axis values are chromosome positions in Mb, which are proportional to the real size of
the swine genome sequence assembly (9.0) (http://www.ensembl.org/Sus_scrofa/Info/Index).

Additionally, 360 QTLs (Additional file 1: Table S5) affecting a
wide range of traits, such as growth, meat quality, reproduction,
immune capacity and disease resistance, were found in 16 CNVRs by
comparing the CNVRs with QTLs in the pig QTLdb (Jan 2, 2011;
http://www.animalgenome.org/cgi-bin/QTLdb/SS/index).

Table 1 Sample sizes and the CNVR numbers detected in the four populations

Breed                          Sample size   CNVR number a   Unique CNVRs b
Yorkshire                      119           178             57
Landrace                       13            89              3
Songliao Black                 15            101             4
Duroc x Erhualian crossbred    327           239             184
Total                          474           382             248

a CNVR number means the total number of CNVRs found in one breed.
b Unique CNVR means a CNVR detected in only one breed.

CNV validation by qPCR 

Quantitative real time PCR (qPCR) was used to validate 18 CNVRs
chosen from the 382 CNVRs detected in the study. These 18 CNVRs
represent different predicted copy number statuses (i.e., loss, gain
and both) and different CNVR frequencies (from 0.84 to 18.57%). A
total of 37 qPCR assays (Additional file 1: Table S6), i.e., two or
three for every CNVR, were performed. Of the 37 qPCR assays, 21
(56.76%) agreed with the PennCNV prediction. At the CNVR level, 12
(66.67%) of the 18 CNVRs (Table 2) were confirmed by at least one
qPCR assay. The average frequency and size of the 12 confirmed CNVRs
were 4.6% and 295.5kb, respectively, which were smaller than those of
the six unconfirmed ones (8.2% and 1,034.8kb, respectively)
(Additional file 1: Table S6).

For the CNVRs with low frequencies we tested all the positive
samples, while for the CNVRs with high frequencies we tested a subset
of them. Furthermore, a number of randomly chosen negative samples
were tested as negative controls for every CNVR. For the positive
samples of the 12 confirmed CNVRs, the proportions of confirmed
samples varied from 68.42% to 100%, with an average of 92.69%. For
the negative samples of the 12 confirmed CNVRs, the proportions of
samples in which qPCR detected a CNV (i.e., false negatives of the
PennCNV prediction) varied from 0 to 72.73%, with an average of
31.82% (Table 2). Additionally, the copy numbers in some CNVRs varied
among individuals. For example, we found one-copy loss and different
copy gains (three to six copies) in CNVR22 (Figure 2), and one- and
two-copy losses in CNVR373 (Figure 3).

Table 2 Results of quantitative real-time PCR analysis of the 12 confirmed CNVRs

Columns: CNVR No.; Chr.; Start a; End a; Frequency; Type; Primer ID;
Positive samples (number of samples, confirmed samples, confirmed rate);
Negative samples (number of samples, confirmed samples, confirmed rate);
Validated genes

a The Sus scrofa assembly (9.0) (http://www.ensembl.org/Sus_scrofa/Info/Index) was used to indicate the position of the CNVRs.

Discussion 

In our study, among the four populations, the largest numbers of
total CNVRs and unique CNVRs were detected in the Duroc x Erhualian
crossbred population. In addition to the larger sample size, another
important reason is that this population has a distinctive genetic
background. In particular, Erhualian is a well-known Chinese
indigenous breed. Many previous studies have indicated that Chinese
indigenous pig breeds differ in genetic background from western
commercial breeds such as Duroc, Landrace and Yorkshire [32-35].
Therefore, breed-specific CNVs are likely to exist in pigs, which is
consistent with the report in cattle [7]. The differences in CNVs
among breeds suggest that some CNVs are likely to have arisen
independently in different breeds and therefore likely contribute to
breed differences.

Figure 2 Normalized ratios (NR) obtained by quantitative real time PCR (qPCR) for CNVR22. Y-axis shows the NR values obtained by qPCR,
and X-axis represents the sample names of the 33 positive and 11 negative control samples tested. Samples with NR about 1 denote normal
individuals (two copies), samples with NR about 0.5 denote individuals with one-copy loss (one copy), and samples with NR about 1.5 or more denote
individuals with copy number gain (≥three copies).

We compared our results with two previous reports on pig CNVs
(Additional file 1: Table S7). Ramayo-Caldas et al. [10] first used
the Porcine SNP60 BeadChip data of 55 animals from an Iberian x
Landrace cross to identify CNVs in pigs, and detected 49 CNVRs that
were called by at least two of the programs cnvPartition (Illumina
Inc.), PennCNV [27] and GADA [36]. Twenty-two of the 49 CNVRs (44.9%)
are identical to or overlap with our results. Using a custom tiling
oligonucleotide array CGH approach, Fadista et al. [9] reported 37
CNVRs on SSC4, 7, 14 and 17 of the preliminary assembly of the pig
genome among 12 Duroc boars. However, only one of these CNVRs was
found to overlap with our results.
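As an illustration of the comparison described above, the following Python sketch counts how many previously reported CNVRs overlap at least one CNVR of a second set. It is a minimal sketch rather than the procedure used in the study: the function names and the tuple layout (chromosome, start, end) are hypothetical, coordinates are assumed to be inclusive, and both sets are assumed to be expressed on the same genome assembly.

```python
def overlaps(a, b):
    """Return True if regions a and b, each (chrom, start, end) with
    inclusive coordinates, lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def overlap_rate(previous, ours):
    """Count regions in `previous` that overlap any region in `ours`.

    Returns (number of overlapping regions, percentage of `previous`).
    """
    hits = sum(1 for p in previous if any(overlaps(p, o) for o in ours))
    rate = 100.0 * hits / len(previous) if previous else 0.0
    return hits, rate


# Toy coordinates for illustration only (not data from the study).
previous = [("4", 100, 200), ("7", 50, 80), ("14", 10, 20)]
ours = [("4", 150, 400), ("7", 90, 120), ("14", 20, 30)]
hits, rate = overlap_rate(previous, ours)
```

With the counts reported in the text, 22 overlapping CNVRs out of 49 correspond to 100 x 22 / 49, i.e. about 44.9%.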

The potential reasons for the different results between this study
and the other two studies are as follows. Firstly, the study
populations differed in size and genetic background. A much larger
sample with a broader genetic background (three pure breeds and one
crossbred population) was included in this study than in the other
two studies, each of which involved only one breed or crossbreed
(different from ours) with a very small sample size. Secondly, the
different platforms, SNP genotyping arrays and CGH arrays, differ in
calling technique, resolution and genome coverage, which contributes
to the discrepancy in the CNVs detected. Thirdly, previous studies
showed that genomic waves interfere significantly with accurate CNV
detection [8,37]. Genomic waves refer to the patterns of signal
intensities across all chromosomes, where different samples may show
highly variable magnitudes of waviness. In our study, the genomic
waves were adjusted using the -gcmodel option, whereas they were not
adjusted in the study of Ramayo-Caldas et al. [10]. Low overlap
between different reports has also been encountered in CNV studies in
other mammals [7,8,38,39].

Figure 3 Normalized ratios (NR) obtained by quantitative real time PCR (qPCR) for CNVR373. Y-axis shows the NR values obtained by qPCR, and
X-axis represents the sample names of the 28 positive and 8 negative control samples tested. Samples with NR about 1 denote normal
individuals (two copies), and samples with NR about 0.5 and about 0 denote individuals with one-copy and two-copy loss (one and zero copies), respectively.

A large number of annotated genes (1,468 Ensembl genes) are located
in the 382 identified CNVRs. The average number of genes per Mb in
the 382 CNVRs is 15.32, which is larger than that of the whole genome
(9.05) according to the Sscrofa 9.0 assembly in Ensembl
(http://asia.ensembl.org/). It has been suggested that CNVs are
located preferentially in gene-poor regions [40,41], probably because
CNVs in gene-rich regions may be deleterious and are therefore
removed by purifying selection [42]. In contrast, the higher gene
density in the CNVRs identified here probably reflects the fact that
the Porcine SNP60 BeadChip used in this study is biased toward
gene-rich regions. Functional analyses, such as GO, pathway and
overlap with QTLs in the pig QTLdb, suggest that these genes have a
great variety of molecular functions, making them a promising
resource for exploring the genetic basis of phenotypic variation
within and among breeds. In particular, consistent with CNV studies
in human, mouse, cattle and dog [1,5,7,21], some of the enriched GO
terms, such as drug detoxification, innate and adaptive immunity, and
receptor and signal recognition, are also present in pigs.
Conservation of some CNVs across different species suggests that
selective pressure may favor specific gene dosage changes, and genes
involved in these CNVs may affect the adaptability and fitness of an
organism in response to external pressures [1].

Most of our CNVRs are reported here for the first time. To confirm
these novel CNVRs, we selected 18 CNVRs for validation by qPCR, and
12 of them (66.67%) were validated. This confirmation rate is higher
than most previously reported rates, such as those of Fadista et al.
[9] in pigs (50%) and Hou et al. [8] in cattle (60%), but a little
lower than that reported by Ramayo-Caldas et al. [10] in pigs (71%).
In the study of Ramayo-Caldas et al. [10], the CNVRs selected for
validation were detected by at least two programs and were of high
frequency, whereas the CNVRs selected for validation here were
detected by one program, with low to high frequencies. The average
proportion of confirmed positive samples of the 12 validated CNVRs
was 92.69%, demonstrating that for most of the positive samples the
qPCR experiments agreed well with the PennCNV prediction, whereas the
false-negative rate in the negative samples was rather high, with an
average of 31.82%. False-negative identification is common in CNV
detection and has been reported previously [9,10,21]. It can be
explained by the stringent criteria of CNV detection, i.e., a CNV had
to contain three or more consecutive SNPs and be present in at least
two individuals, which were applied to minimize false positives and
thus inevitably resulted in a high false-negative rate.

Eight of the 12 successfully validated CNVRs contain functionally
important genes. Three of them (CNVR_ID: 22, 276 and 373) include
genes of the olfactory receptor (OR) family. ORs are involved in
odorant recognition and form the largest mammalian protein
superfamily [43]. Many studies in human and other mammals also
indicate that the OR genomic loci are frequently affected by CNVs
[2,4,5,40,43,44]. The qPCR assays revealed that all three CNVRs could
be confirmed by two pairs of primers. The other five CNVRs (CNVR_No
20, 259, 314, 325, 344) contain many important immune-related and
basic metabolic genes, including TNF receptor-associated factor 1
(TRAF1), EGF containing fibulin-like extracellular matrix protein 2
(EFEMP2), D4, zinc and double PHD fingers family 2 (DPF2), CD4
molecule (CD4), glyceraldehyde-3-phosphate dehydrogenase (GAPDH),
ferritin, light polypeptide (FTL) and interferon regulatory factor 3
(IRF3). The functions of these genes have been reported in pig and
other species, and detailed information is shown in Table S9 of
Additional file 1. In particular, this is the first time that CD4 has
been found to have a copy number change, not only in pigs but also in
human and other animals. Considering the important functions of the
genes they contain, these five CNVRs warrant further study.

The Porcine SNP60 BeadChip was originally developed for
high-throughput SNP genotyping for genome-wide association studies.
Although CNV detection is also feasible with such a panel, it is
impaired by low marker density, non-uniform distribution of SNPs
along pig chromosomes and a lack of non-polymorphic probes
specifically designed for CNV identification [45]. Hence, only large
CNVRs are expected to be detected with the Porcine SNP60 array.
Furthermore, the Sscrofa 9 assembly, with 4x sequence depth across
the genome, is still incomplete, which makes it difficult to
determine the boundaries of CNVRs. Accordingly, multiple neighboring
but discrete CNV events could trigger a larger call by PennCNV,
leading to an over-estimation of the CNV size. Therefore, it is quite
possible that some qPCR primers used to validate the CNVRs were
designed beyond the true boundaries of the CNVRs. Besides these
aspects, factors such as SNPs and small indels that remain undetected
so far could also influence the hybridization of the qPCR primers in
some animals, resulting in unstable quantification values or reduced
primer efficiency.

Many gene families, including olfactory receptor, solute carrier,
cytochrome P450, MHC and interleukin, which had been reported to be
influenced by CNVs in human and other mammals [10,44,46], were also
found in the CNVRs of this study. Additionally, by converting the pig
Ensembl gene IDs to their orthologous human genes, we checked whether
they have been included in the Human Database of Genomic Variants
(http://projects.tcag.ca/variation/). It turned out that 590 genes
(Additional file 1: Table S2), a remarkably high proportion (40.19%)
of all genes in the identified CNVRs, were reported to be influenced
by CNVs in human.

Conclusions 

We have performed genome-wide CNV detection based on the Porcine
SNP60 genotyping data of 474 pigs and provided the highest resolution
CNV map of the pig genome so far. A total of 382 CNVRs were
identified. Validation of 18 of these CNVRs by qPCR assays produced a
high rate (66.67%) of confirmation. We conclude that the currently
available genome-wide SNP assays can capture CNVs efficiently.
However, it should be noted that only large CNVRs are expected to be
identified using this SNP panel, and the number of CNVs identified in
this study is likely to be a gross underestimation of the true number
of CNVs in the pig genome. Follow-up studies, using improved SNP
arrays as well as other technologies, such as CGH arrays and
next-generation sequencing [47], should be carried out to attain a
high-resolution CNV map. Association studies between CNVs and
diseases have become popular in human [15-17], and have begun in
animals as well [48]. The findings of our study provide meaningful
genomic variation information for future association studies between
CNVs and economically important phenotypes of pigs.

Methods 

Animal resource 

The animals initially used in this study comprised 1,017 pigs from
four populations with different genetic backgrounds, including 500
Yorkshire pigs, 85 Landrace pigs, 96 Songliao Black pigs, and 336
Duroc x Erhualian crossbred pigs. Songliao Black is a breed derived
from a cross of Landrace, Duroc and Min pigs. The Duroc x Erhualian
crossbred was formed by crossing eight Duroc boars with 18 Erhualian
sows. Both Min pigs and Erhualian pigs are Chinese indigenous breeds.

SNP array genotyping and quality control 

Genomic DNA samples were extracted from ear tissue of 
all pigs using a standard phenol/chloroform method. All 
DNA samples were analyzed by spectrophotometry and 
agarose gel electrophoresis. The genotyping platform 
used was Infinium II Multisample assay (Illumina Inc.). 
SNP arrays were scanned using iScan (Illumina Inc.) and 
analyzed using BeadStudio (Version 3.2.2, Illumina, Inc.). 
The whole procedure for collection of the ear tissue sam- 
ples was carried out in strict accordance with the proto- 
col approved by the Animal Welfare Committee of China 
Agricultural University (Permit number: DK996). 

In order to exclude poor-quality DNA samples and decrease potential
false-positive CNVs, quality control was performed as follows. The
genome-wide intensity signal must have as little noise as possible,
so only samples with a standard deviation of normalized intensity
(Log R ratio, LRR) <0.30 and B allele frequency (BAF) drift <0.01
were included. Since wave artifacts roughly correlating with GC
content, which result from hybridization bias when the quantity of
full-length DNA is low, could interfere with accurate inference of
CNVs [37], only samples in which the GC wave factor of LRR was less
than 0.05 were accepted. After quality control, 474 of the 1,017
samples (119 Yorkshire pigs, 13 Landrace pigs, 15 Songliao Black pigs
and 327 Duroc x Erhualian crossbred pigs) with high-quality
genotyping (average call rate 99.67%) remained for CNV detection.
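The sample-level thresholds above can be expressed as a simple filter. The Python sketch below is illustrative only and is not the code used in the study: the `SampleQC` record and its field names are hypothetical, and applying the 0.05 cut-off to the absolute value of the GC wave factor is an assumption, since the text states only "less than 0.05".

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-sample quality metrics (hypothetical record layout)."""
    sample_id: str
    lrr_sd: float     # standard deviation of Log R ratio
    baf_drift: float  # B allele frequency drift
    gc_wf: float      # GC wave factor of LRR


# Thresholds as stated in the text.
MAX_LRR_SD = 0.30
MAX_BAF_DRIFT = 0.01
MAX_ABS_GC_WF = 0.05  # assumption: applied to the absolute value


def passes_qc(s: SampleQC) -> bool:
    """Return True if the sample meets all three intensity criteria."""
    return (
        s.lrr_sd < MAX_LRR_SD
        and s.baf_drift < MAX_BAF_DRIFT
        and abs(s.gc_wf) < MAX_ABS_GC_WF
    )


def filter_samples(samples):
    """Keep only the samples that pass quality control."""
    return [s for s in samples if passes_qc(s)]


# Toy values for illustration only (not data from the study).
samples = [
    SampleQC("a", 0.20, 0.005, 0.01),   # passes
    SampleQC("b", 0.35, 0.005, 0.01),   # LRR too noisy
    SampleQC("c", 0.20, 0.005, -0.08),  # strong genomic wave
]
kept = filter_samples(samples)
```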

Identification of pig CNVs 

The PennCNV software [27] was applied to identify pig CNVs in this study. This
algorithm incorporates multiple sources of information, including total signal
intensity (LRR) and allelic intensity ratio (BAF) at each SNP marker, the distance
between neighboring SNPs, the population frequency of the B allele (PFB) of SNPs,
and pedigree information where available [27]. Both LRR and BAF were exported from
BeadStudio (Illumina Inc.) using the default clustering file for each SNP. The PFB
file was calculated based on the BAF of each marker. The physical positions of SNPs
on chromosomes were derived from the swine genome sequence assembly (9.0)
(http://www.ensembl.org/Sus_scrofa/Info/Index). Furthermore, PennCNV integrates a
computational approach that fits regression models with GC content to overcome
"genomic waves". The pig gcmodel file was generated by calculating the GC content
of the 1Mb genomic region surrounding each marker (500kb on each side), and the
genomic waves were adjusted using the -gcmodel option. Although many of the samples
initially had pedigree information, most of the trio information was unavailable
after quality control, so pedigree/trio information was not incorporated into the
analyses.
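The GC content calculation described above (a 1Mb window, 500kb on each side of a marker) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script used to build the gcmodel file: the function names are hypothetical, marker positions are assumed to be 1-based, ambiguous bases such as N are ignored, and the window is clipped at chromosome ends. The layout of the gcmodel file itself is not reproduced here.

```python
def gc_percent(seq: str) -> float:
    """GC percentage among unambiguous bases (A, C, G, T) of a sequence.

    Returns 0.0 when the sequence contains no unambiguous base.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def marker_gc(chrom_seq: str, pos: int, flank: int = 500_000) -> float:
    """GC percentage of the window of `flank` bases on each side of a marker.

    `pos` is the 1-based marker position; the window is clipped to the
    chromosome boundaries.
    """
    center = pos - 1                      # convert to 0-based index
    start = max(0, center - flank)
    end = min(len(chrom_seq), center + flank)
    return gc_percent(chrom_seq[start:end])


# Toy sequence with a tiny flank, for illustration only.
example = marker_gc("GGCCAATT", pos=4, flank=2)  # window "GCCA"
```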

In this study, a CNV was accepted only if it met two criteria: first,
it had to contain three or more consecutive SNPs, and second, it had
to be present in at least two individuals. Finally, CNV regions
(CNVRs) were determined by aggregating overlapping CNVs identified
across all samples according to the criteria proposed by Redon et al.
[2].
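The aggregation step can be illustrated with a short interval-merging routine. The Python sketch below is one possible reading of the criteria above, not the implementation used in the study: the `CNV` record and `build_cnvrs` function are hypothetical names, coordinates are assumed to be inclusive, and the requirement of at least two individuals is applied here to each merged region.

```python
from collections import namedtuple

# One CNV call in one sample (hypothetical record layout).
CNV = namedtuple("CNV", "sample chrom start end n_snps")


def build_cnvrs(cnvs, min_snps=3, min_samples=2):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    Calls with fewer than `min_snps` SNPs are dropped, the remaining
    calls are merged whenever they overlap on the same chromosome, and
    only regions supported by at least `min_samples` distinct samples
    are returned as (chrom, start, end, number_of_samples).
    """
    calls = sorted(
        (c for c in cnvs if c.n_snps >= min_snps),
        key=lambda c: (c.chrom, c.start, c.end),
    )
    regions = []  # each entry: [chrom, start, end, set of samples]
    for c in calls:
        last = regions[-1] if regions else None
        if last and last[0] == c.chrom and c.start <= last[2]:
            last[2] = max(last[2], c.end)  # extend the current region
            last[3].add(c.sample)
        else:
            regions.append([c.chrom, c.start, c.end, {c.sample}])
    return [
        (chrom, start, end, len(samples))
        for chrom, start, end, samples in regions
        if len(samples) >= min_samples
    ]


# Toy calls for illustration only (not data from the study).
calls = [
    CNV("s1", "1", 100, 200, 5),
    CNV("s2", "1", 150, 300, 4),
    CNV("s3", "1", 500, 600, 3),  # seen in a single sample
    CNV("s4", "2", 100, 200, 2),  # fewer than three SNPs
    CNV("s5", "2", 100, 250, 6),
]
cnvrs = build_cnvrs(calls)
```

In this toy input only the region on chromosome 1 spanning positions 100 to 300 is retained, because it is the only merged region supported by two samples.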

Due to the limited density of SNPs on chromosome X, i.e., an average
SNP interval of about 86kb, which is twice the average interval
across the whole genome, CNVs detected on chromosome X might have a
high false-positive rate and were excluded from further analyses in
our study.

Gene contents and functional annotation 

Gene contents in the identified CNVRs were retrieved from the
Ensembl Genes 64 Database using the BioMart
(http://www.biomart.org/) data management system [28]. To provide
insight into the functional enrichment of the CNVs, functional
annotation was performed with the DAVID bioinformatics resources 6.7
(http://david.abcc.ncifcrf.gov/summary.jsp) [31] for Gene Ontology
(GO) terms [29] and Kyoto Encyclopedia of Genes and Genomes (KEGG)
[30] pathway analyses. Since only a limited number of genes in the
pig genome have been annotated, we first converted the pig Ensembl
gene IDs to orthologous mouse Ensembl gene IDs using BioMart
(Additional file 1: Table S8), and then carried out the GO and
pathway analyses. Statistical significance was assessed using the P
value of a modified Fisher's exact test and Benjamini correction for
multiple testing.
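For readers unfamiliar with the multiple-testing step, the sketch below shows a standard Benjamini-Hochberg adjustment in Python. It assumes that the "Benjamini correction" above refers to this procedure, it takes raw P values as input (the modified Fisher's exact test itself is not implemented), and the function name is hypothetical; it is not the code used by DAVID.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is multiplied by m / rank (m = number of tests, rank =
    its position when sorted ascending), and monotonicity is enforced
    by taking the running minimum from the largest P value downwards.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values for illustration only (not data from the study).
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A term would then be called significant when its adjusted P value falls below the chosen threshold.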

Quantitative real time PCR 

Quantitative real time PCR (qPCR) was used to validate 18 CNVRs chosen
from the 382 CNVRs detected in the study. We used the 2^(-ΔΔCt) method
for relative quantification of CNVs [49], which compares the ΔCt value
(cycle threshold (Ct) of the target region minus Ct of the control
region) of samples with a CNV to the ΔCt of a calibrator without a CNV.
The glucagon gene (GCG) is highly conserved between species and has been
shown to have a single copy in animals [10,50], so one segment of it was
chosen as the control region. Primers (Table S6 of Additional file 1)
were designed with the Primer3 web tool
(http://frodo.wi.mit.edu/primer3/). Moreover, the UCSC In-Silico PCR
tool (http://genome.ucsc.edu/cgi-bin/hgPcr?command=start) was used for
in silico specificity analysis [51]. Prior to performing the copy number
assay, we generated standard curves for the primers of the target and
control regions to determine their PCR efficiencies. To ensure
comparable amplification efficiencies between target and control
primers, the PCR efficiencies of all primers used in the study were
required to be 1.95-2.10.
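The relative quantification described above can be written out as a few lines of Python. This is a hedged sketch rather than the analysis code of the study: the function names are hypothetical, the calibrator is assumed to carry two copies (consistent with the interpretation of normalized ratios in Figures 2 and 3), and the efficiency formula is the commonly used conversion of a standard-curve slope (Ct against log10 of template amount) into an amplification factor per cycle, where 2.0 means perfect doubling.

```python
def normalized_ratio(ct_target, ct_control, ct_target_cal, ct_control_cal):
    """Normalized ratio 2^(-ddCt) of a sample relative to a calibrator.

    dCt is Ct(target region) minus Ct(control region); ddCt is the dCt of
    the sample minus the dCt of the calibrator without a CNV.
    """
    delta_ct_sample = ct_target - ct_control
    delta_ct_calibrator = ct_target_cal - ct_control_cal
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


def estimated_copy_number(nr, calibrator_copies=2):
    """Nearest integer copy number, assuming a two-copy calibrator."""
    return round(calibrator_copies * nr)


def efficiency_from_slope(slope):
    """Amplification factor per cycle from a standard-curve slope."""
    return 10.0 ** (-1.0 / slope)


# Toy Ct values for illustration only (not data from the study): the
# target amplifies one cycle later than in the calibrator.
nr = normalized_ratio(26.0, 25.0, 25.0, 25.0)
copies = estimated_copy_number(nr)
```

In this toy case the one-cycle delay gives a normalized ratio of 0.5, which corresponds to a one-copy loss under the two-copy calibrator assumption.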

All qPCR reactions were carried out using LightCycler® 480 SYBR Green
I Master on a Roche LightCycler® 480 instrument following the
manufacturer's guidelines and cycling conditions. The reactions were
carried out in a 96-well plate in a 20µl volume, containing 10µl
Blue-SYBR-Green mix, 1µl forward and reverse primers (10pM/µl) and
1µl of 20ng/µl genomic DNA. Each sample was analyzed in duplicate.
The second derivative maximum algorithm included in the instrument
software was used to determine the cycle threshold (Ct) values for
each region.

Additional files 



Information and the primers used in qPCR analyses of the 18 CNVRs 
chosen to be validated. Additional file 1: Table S7. Comparison between 
identified CNVRs and those of previous reports of pig CNVs. Additional 
file 1: Table S8. Pig Ensembl gene IDs and their orthologous mouse IDs. 
Additional file 1: Table S9. Functions of the genes validated to be copy 
number variable by qPCR assay [52-65]. 